Overview
Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 427 | 436 |
| Missing cells (%) | 8.0% | 8.1% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
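The percentages and sizes in the table above can be cross-checked with simple arithmetic. The sketch below assumes that "Missing cells (%)" is the share of missing cells among all cells (variables × observations) and that the total size is the average record size times the number of observations; both assumptions are consistent with the reported figures, but they are not taken from the tool's documentation.

```python
# Values copied from the "Dataset statistics" table.
N_VARIABLES = 12
N_OBSERVATIONS = 446


def missing_cells_pct(missing_cells: int) -> float:
    """Share of missing cells among all cells (rows x columns), in percent."""
    total_cells = N_VARIABLES * N_OBSERVATIONS
    return 100 * missing_cells / total_cells


pct_a = missing_cells_pct(427)  # Dataset A
pct_b = missing_cells_pct(436)  # Dataset B
print(f"Missing cells: A {pct_a:.1f}%, B {pct_b:.1f}%")

# Average record size (bytes) times row count, converted to KiB (1 KiB = 1024 B).
total_kib = 104.0 * N_OBSERVATIONS / 1024
print(f"Total size in memory: {total_kib:.1f} KiB")
```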
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 4 |
| Categorical | 4 | 5 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Age has 89 (20.0%) missing values | Age has 93 (20.9%) missing values | Missing |
| Cabin has 337 (75.6%) missing values | Cabin has 343 (76.9%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 296 (66.4%) zeros | SibSp has 310 (69.5%) zeros | Zeros |
| Parch has 343 (76.9%) zeros | Alert not present in this dataset | Zeros |
| Fare has 7 (1.6%) zeros | Fare has 5 (1.1%) zeros | Zeros |
| Alert not present in this dataset | Sex is highly overall correlated with Survived | High correlation |
| Alert not present in this dataset | Survived is highly overall correlated with Sex | High correlation |
| Alert not present in this dataset | Parch is highly imbalanced (51.8%) | Imbalance |
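The 51.8% imbalance figure reported for Parch in Dataset B can be reproduced from that variable's value counts (338, 60, 41, 4, 3, as listed in its Common Values table). The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by the maximum possible entropy; this formula is an assumption that happens to match the reported number, and the function name is illustrative.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Parch counts in Dataset B for the values 0, 1, 2, 3 and 5.
parch_b = [338, 60, 41, 4, 3]
score = imbalance_score(parch_b)
print(f"Parch imbalance (Dataset B): {score:.1%}")
```

A score of 0 would mean perfectly balanced classes, and a score near 1 would mean that almost every row falls into a single class.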
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2025-03-21 10:30:33.770721 | 2025-03-21 10:30:35.912971 |
| Analysis finished | 2025-03-21 10:30:35.910162 | 2025-03-21 10:30:37.483535 |
| Duration | 2.14 seconds | 1.57 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
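The durations above follow directly from the start and finish timestamps. A minimal check using only the standard library (the helper name is illustrative):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed time in seconds between two timestamps in the report's format."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2025-03-21 10:30:33.770721", "2025-03-21 10:30:35.910162")
dur_b = duration_seconds("2025-03-21 10:30:35.912971", "2025-03-21 10:30:37.483535")
print(f"Dataset A: {dur_a:.2f} s, Dataset B: {dur_b:.2f} s")
```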
Variables
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 448.47982 | 454.15919 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| Maximum | 891 | 891 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| 5-th percentile | 44.5 | 38.75 |
| Q1 | 213 | 232.25 |
| median | 445 | 455.5 |
| Q3 | 689 | 689.5 |
| 95-th percentile | 859.5 | 852 |
| Maximum | 891 | 891 |
| Range | 890 | 890 |
| Interquartile range (IQR) | 476 | 457.25 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 265.00801 | 258.87042 |
| Coefficient of variation (CV) | 0.59090285 | 0.5699993 |
| Kurtosis | -1.2651205 | -1.2221746 |
| Mean | 448.47982 | 454.15919 |
| Median Absolute Deviation (MAD) | 236.5 | 229 |
| Skewness | -0.0033682188 | -0.027847904 |
| Sum | 200022 | 202555 |
| Variance | 70229.243 | 67013.896 |
| Monotonicity | Not monotonic | Not monotonic |
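Several PassengerId statistics above are derived from one another. The sketch below recomputes them from the reported inputs, assuming the usual definitions (mean = sum / count, CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 − Q1, range = maximum − minimum). Small differences in the last digits are expected because the inputs are already rounded.

```python
# Inputs copied from the PassengerId tables.
passenger_id = {
    "Dataset A": dict(n=446, total=200022, std=265.00801, q1=213, q3=689, lo=1, hi=891),
    "Dataset B": dict(n=446, total=202555, std=258.87042, q1=232.25, q3=689.5, lo=1, hi=891),
}


def derived(s):
    """Recompute the derived statistics from the reported inputs."""
    mean = s["total"] / s["n"]
    return {
        "mean": mean,
        "cv": s["std"] / mean,
        "variance": s["std"] ** 2,
        "iqr": s["q3"] - s["q1"],
        "range": s["hi"] - s["lo"],
    }


results = {name: derived(s) for name, s in passenger_id.items()}
for name, r in results.items():
    print(name, {k: round(v, 5) for k, v in r.items()})
```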
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 293 | 1 | 0.2% |
| 361 | 1 | 0.2% |
| 278 | 1 | 0.2% |
| 776 | 1 | 0.2% |
| 674 | 1 | 0.2% |
| 627 | 1 | 0.2% |
| 868 | 1 | 0.2% |
| 125 | 1 | 0.2% |
| 682 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 397 | 1 | 0.2% |
| 279 | 1 | 0.2% |
| 159 | 1 | 0.2% |
| 555 | 1 | 0.2% |
| 32 | 1 | 0.2% |
| 414 | 1 | 0.2% |
| 491 | 1 | 0.2% |
| 589 | 1 | 0.2% |
| 540 | 1 | 0.2% |
| 542 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 3 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 8 | 1 | |
| 11 | 1 | |
| 12 | 1 | |
| 14 | 1 | |
| 15 | 1 | |
| 16 | 1 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 3 | 1 | |
| 5 | 1 | |
| 8 | 1 | |
| 12 | 1 | |
| 13 | 1 | |
| 14 | 1 | |
| 16 | 1 | |
| 18 | 1 | |
| 21 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 0 |
| 2nd row | 0 | 0 |
| 3rd row | 0 | 1 |
| 4th row | 1 | 1 |
| 5th row | 0 | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 276 | |
| 1 | 170 |
| Value | Count | Frequency (%) |
| 0 | 281 | |
| 1 | 165 |
Length: histogram of lengths of the category (plot not included)
Common Values (Plot): Dataset A, Dataset B (plots not included)
| Value | Count | Frequency (%) |
| 0 | 276 | |
| 1 | 170 |
| Value | Count | Frequency (%) |
| 0 | 281 | |
| 1 | 165 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 276 | |
| 1 | 170 |
| Value | Count | Frequency (%) |
| 0 | 281 | |
| 1 | 165 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 276 | |
| 1 | 170 |
| Value | Count | Frequency (%) |
| 0 | 281 | |
| 1 | 165 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 276 | |
| 1 | 170 |
| Value | Count | Frequency (%) |
| 0 | 281 | |
| 1 | 165 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 276 | |
| 1 | 170 |
| Value | Count | Frequency (%) |
| 0 | 281 | |
| 1 | 165 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 3 |
| 2nd row | 2 | 3 |
| 3rd row | 3 | 3 |
| 4th row | 2 | 1 |
| 5th row | 2 | 2 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 254 | |
| 1 | 104 | |
| 2 | 88 | 19.7% |
| Value | Count | Frequency (%) |
| 3 | 237 | |
| 1 | 111 | |
| 2 | 98 |
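Most frequency cells in the Common Values tables are empty in this export, but they can be recovered from the counts. The sketch below assumes each frequency is the count divided by the total number of rows (446 here, since Pclass has no missing values); this reproduces the one frequency that is shown, 19.7% for value 2 in Dataset A.

```python
def frequencies(counts):
    """Map each value to its share of all rows, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {value: round(100 * c / total, 1) for value, c in counts.items()}


# Pclass counts copied from the Common Values tables.
pclass_a = {"3": 254, "1": 104, "2": 88}
pclass_b = {"3": 237, "1": 111, "2": 98}

freq_a = frequencies(pclass_a)
freq_b = frequencies(pclass_b)
print("Dataset A:", freq_a)
print("Dataset B:", freq_b)
```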
Length: histogram of lengths of the category (plot not included)
Common Values (Plot): Dataset A, Dataset B (plots not included)
| Value | Count | Frequency (%) |
| 3 | 254 | |
| 1 | 104 | |
| 2 | 88 | 19.7% |
| Value | Count | Frequency (%) |
| 3 | 237 | |
| 1 | 111 | |
| 2 | 98 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 254 | |
| 1 | 104 | |
| 2 | 88 | 19.7% |
| Value | Count | Frequency (%) |
| 3 | 237 | |
| 1 | 111 | |
| 2 | 98 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 254 | |
| 1 | 104 | |
| 2 | 88 | 19.7% |
| Value | Count | Frequency (%) |
| 3 | 237 | |
| 1 | 111 | |
| 2 | 98 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 254 | |
| 1 | 104 | |
| 2 | 88 | 19.7% |
| Value | Count | Frequency (%) |
| 3 | 237 | |
| 1 | 111 | |
| 2 | 98 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 254 | |
| 1 | 104 | |
| 2 | 88 | 19.7% |
| Value | Count | Frequency (%) |
| 3 | 237 | |
| 1 | 111 | |
| 2 | 98 |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 57 | 67 |
| Median length | 48 | 50 |
| Mean length | 26.784753 | 26.701794 |
| Min length | 13 | 12 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Skoog, Mr. Wilhelm | Rice, Master. Eric |
| 2nd row | Parkes, Mr. Francis "Frank" | Smiljanic, Mr. Mile |
| 3rd row | Myhrman, Mr. Pehr Fabian Oliver Malkolm | Ohman, Miss. Velin |
| 4th row | Wilhelms, Mr. Charles | Spencer, Mrs. William Augustus (Marie Eugenie) |
| 5th row | Kirkland, Rev. Charles Leonard | Cunningham, Mr. Alfred Fleming |
| Value | Count | Frequency (%) |
| mr | 258 | 14.3% |
| miss | 96 | 5.3% |
| mrs | 62 | 3.4% |
| william | 30 | 1.7% |
| master | 20 | 1.1% |
| george | 15 | 0.8% |
| henry | 15 | 0.8% |
| john | 13 | 0.7% |
| james | 12 | 0.7% |
| anna | 11 | 0.6% |
| Other values (892) | 1271 |
| Value | Count | Frequency (%) |
| mr | 260 | 14.5% |
| miss | 95 | 5.3% |
| mrs | 61 | 3.4% |
| william | 33 | 1.8% |
| john | 23 | 1.3% |
| henry | 22 | 1.2% |
| master | 19 | 1.1% |
| charles | 15 | 0.8% |
| thomas | 15 | 0.8% |
| mary | 13 | 0.7% |
| Other values (876) | 1242 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1359 | 11.4% |
| r | 968 | 8.1% |
| e | 853 | 7.1% |
| a | 825 | 6.9% |
| s | 656 | 5.5% |
| i | 654 | 5.5% |
| n | 647 | 5.4% |
| M | 554 | 4.6% |
| l | 526 | 4.4% |
| o | 518 | 4.3% |
| Other values (50) | 4386 |
| Value | Count | Frequency (%) |
| (space) | 1353 | 11.4% |
| r | 962 | 8.1% |
| e | 838 | 7.0% |
| a | 818 | 6.9% |
| n | 666 | 5.6% |
| i | 654 | 5.5% |
| s | 636 | 5.3% |
| M | 544 | 4.6% |
| l | 532 | 4.5% |
| o | 503 | 4.2% |
| Other values (49) | 4403 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 11946 |
| Value | Count | Frequency (%) |
| (unknown) | 11909 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1359 | 11.4% |
| r | 968 | 8.1% |
| e | 853 | 7.1% |
| a | 825 | 6.9% |
| s | 656 | 5.5% |
| i | 654 | 5.5% |
| n | 647 | 5.4% |
| M | 554 | 4.6% |
| l | 526 | 4.4% |
| o | 518 | 4.3% |
| Other values (50) | 4386 |
| Value | Count | Frequency (%) |
| (space) | 1353 | 11.4% |
| r | 962 | 8.1% |
| e | 838 | 7.0% |
| a | 818 | 6.9% |
| n | 666 | 5.6% |
| i | 654 | 5.5% |
| s | 636 | 5.3% |
| M | 544 | 4.6% |
| l | 532 | 4.5% |
| o | 503 | 4.2% |
| Other values (49) | 4403 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 11946 |
| Value | Count | Frequency (%) |
| (unknown) | 11909 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1359 | 11.4% |
| r | 968 | 8.1% |
| e | 853 | 7.1% |
| a | 825 | 6.9% |
| s | 656 | 5.5% |
| i | 654 | 5.5% |
| n | 647 | 5.4% |
| M | 554 | 4.6% |
| l | 526 | 4.4% |
| o | 518 | 4.3% |
| Other values (50) | 4386 |
| Value | Count | Frequency (%) |
| (space) | 1353 | 11.4% |
| r | 962 | 8.1% |
| e | 838 | 7.0% |
| a | 818 | 6.9% |
| n | 666 | 5.6% |
| i | 654 | 5.5% |
| s | 636 | 5.3% |
| M | 544 | 4.6% |
| l | 532 | 4.5% |
| o | 503 | 4.2% |
| Other values (49) | 4403 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 11946 |
| Value | Count | Frequency (%) |
| (unknown) | 11909 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1359 | 11.4% |
| r | 968 | 8.1% |
| e | 853 | 7.1% |
| a | 825 | 6.9% |
| s | 656 | 5.5% |
| i | 654 | 5.5% |
| n | 647 | 5.4% |
| M | 554 | 4.6% |
| l | 526 | 4.4% |
| o | 518 | 4.3% |
| Other values (50) | 4386 |
| Value | Count | Frequency (%) |
| (space) | 1353 | 11.4% |
| r | 962 | 8.1% |
| e | 838 | 7.0% |
| a | 818 | 6.9% |
| n | 666 | 5.6% |
| i | 654 | 5.5% |
| s | 636 | 5.3% |
| M | 544 | 4.6% |
| l | 532 | 4.5% |
| o | 503 | 4.2% |
| Other values (49) | 4403 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7219731 | 4.7085202 |
| Min length | 4 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | male |
| 2nd row | male | male |
| 3rd row | male | female |
| 4th row | male | female |
| 5th row | male | male |
Common Values
| Value | Count | Frequency (%) |
| male | 285 | |
| female | 161 |
| Value | Count | Frequency (%) |
| male | 288 | |
| female | 158 |
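The mean lengths reported for Sex (4.7219731 and 4.7085202) follow from the category counts above: "male" has four characters and "female" has six. The sketch below is a plain-Python check with an illustrative helper name. The character totals it produces, 2106 and 2100, also match the totals reported in the character tables for this variable.

```python
def length_stats(counts):
    """Return (total characters, mean length) for a mapping of label -> occurrences."""
    total_chars = sum(len(label) * n for label, n in counts.items())
    n_values = sum(counts.values())
    return total_chars, total_chars / n_values


chars_a, mean_a = length_stats({"male": 285, "female": 161})  # Dataset A
chars_b, mean_b = length_stats({"male": 288, "female": 158})  # Dataset B
print(f"Dataset A: {chars_a} characters, mean length {mean_a:.7f}")
print(f"Dataset B: {chars_b} characters, mean length {mean_b:.7f}")
```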
Length: histogram of lengths of the category (plot not included)
Common Values (Plot): Dataset A, Dataset B (plots not included)
| Value | Count | Frequency (%) |
| male | 285 | |
| female | 161 |
| Value | Count | Frequency (%) |
| male | 288 | |
| female | 158 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 604 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 158 | 7.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2106 |
| Value | Count | Frequency (%) |
| (unknown) | 2100 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 604 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 158 | 7.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2106 |
| Value | Count | Frequency (%) |
| (unknown) | 2100 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 604 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 158 | 7.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2106 |
| Value | Count | Frequency (%) |
| (unknown) | 2100 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 604 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 158 | 7.5% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 74 | 73 |
| Distinct (%) | 20.7% | 20.7% |
| Missing | 89 | 93 |
| Missing (%) | 20.0% | 20.9% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 30.017983 | 30.095382 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| Maximum | 80 | 70 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| 5-th percentile | 4 | 6.6 |
| Q1 | 21 | 21 |
| median | 29 | 28 |
| Q3 | 39 | 38 |
| 95-th percentile | 54 | 56 |
| Maximum | 80 | 70 |
| Range | 79.58 | 69.58 |
| Interquartile range (IQR) | 18 | 17 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.348467 | 13.849738 |
| Coefficient of variation (CV) | 0.47799571 | 0.46019478 |
| Kurtosis | 0.22239404 | -0.075073147 |
| Mean | 30.017983 | 30.095382 |
| Median Absolute Deviation (MAD) | 8 | 8 |
| Skewness | 0.32565827 | 0.29994218 |
| Sum | 10716.42 | 10623.67 |
| Variance | 205.87851 | 191.81524 |
| Monotonicity | Not monotonic | Not monotonic |
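Because Age has missing values, its mean is taken over the non-missing rows only. The sketch below checks this against the reported figures, assuming 446 rows per dataset and that missing values are excluded from the sum; dividing by all 446 rows would give a noticeably lower mean than the one reported.

```python
N_ROWS = 446

# Inputs copied from the Age tables.
age = {
    "Dataset A": {"missing": 89, "total": 10716.42, "std": 14.348467},
    "Dataset B": {"missing": 93, "total": 10623.67, "std": 13.849738},
}


def summarize(s):
    """Return (non-missing count, mean over non-missing rows, coefficient of variation)."""
    n_valid = N_ROWS - s["missing"]
    mean = s["total"] / n_valid
    return n_valid, mean, s["std"] / mean


age_stats = {name: summarize(s) for name, s in age.items()}
for name, (n_valid, mean, cv) in age_stats.items():
    print(f"{name}: n={n_valid}, mean={mean:.6f}, CV={cv:.6f}")
```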
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 25 | 15 | 3.4% |
| 22 | 14 | 3.1% |
| 30 | 13 | 2.9% |
| 27 | 13 | 2.9% |
| 31 | 12 | 2.7% |
| 29 | 12 | 2.7% |
| 21 | 11 | 2.5% |
| 19 | 11 | 2.5% |
| 32 | 11 | 2.5% |
| 36 | 11 | 2.5% |
| Other values (64) | 234 | |
| (Missing) | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 24 | 19 | 4.3% |
| 30 | 16 | 3.6% |
| 22 | 15 | 3.4% |
| 25 | 14 | 3.1% |
| 21 | 13 | 2.9% |
| 28 | 13 | 2.9% |
| 35 | 12 | 2.7% |
| 19 | 11 | 2.5% |
| 29 | 10 | 2.2% |
| 20 | 10 | 2.2% |
| Other values (63) | 220 | |
| (Missing) | 93 |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.67 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 1 | 2 | 0.4% |
| 2 | 5 | |
| 3 | 3 | |
| 4 | 7 | |
| 5 | 2 | 0.4% |
| 6 | 2 | 0.4% |
| 7 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.67 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 1 | 2 | |
| 2 | 4 | |
| 3 | 2 | |
| 4 | 4 | |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.5470852 | 0.58744395 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 296 | 310 |
| Zeros (%) | 66.4% | 69.5% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2.75 | 3 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.1182965 | 1.324013 |
| Coefficient of variation (CV) | 2.0440993 | 2.2538543 |
| Kurtosis | 18.378262 | 15.066292 |
| Mean | 0.5470852 | 0.58744395 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.7213771 | 3.5973032 |
| Sum | 244 | 262 |
| Variance | 1.250587 | 1.7530105 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 0 | 296 | |
| 1 | 111 | 24.9% |
| 2 | 16 | 3.6% |
| 3 | 9 | 2.0% |
| 4 | 8 | 1.8% |
| 8 | 4 | 0.9% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 310 | |
| 1 | 93 | 20.9% |
| 2 | 13 | 2.9% |
| 4 | 10 | 2.2% |
| 3 | 9 | 2.0% |
| 8 | 7 | 1.6% |
| 5 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 296 | |
| 1 | 111 | 24.9% |
| 2 | 16 | 3.6% |
| 3 | 9 | 2.0% |
| 4 | 8 | 1.8% |
| 5 | 2 | 0.4% |
| 8 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 310 | |
| 1 | 93 | 20.9% |
| 2 | 13 | 2.9% |
| 3 | 9 | 2.0% |
| 4 | 10 | 2.2% |
| 5 | 4 | 0.9% |
| 8 | 7 | 1.6% |
Parch
Numeric
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 5 |
| Distinct (%) | 1.6% | 1.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset B |
|---|---|
| Max length | 1 |
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 1 | 0 |
| Unique (%) | 0.2% | 0.0% |
Sample
| | Dataset B |
|---|---|
| 1st row | 1 |
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 343 | |
| 1 | 58 | 13.0% |
| 2 | 35 | 7.8% |
| 5 | 4 | 0.9% |
| 4 | 3 | 0.7% |
| 3 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 338 | |
| 1 | 60 | 13.5% |
| 2 | 41 | 9.2% |
| 3 | 4 | 0.9% |
| 5 | 3 | 0.7% |
Length: histogram of lengths of the category (plot not included)
Common Values (Plot): Dataset A, Dataset B (plots not included)
| Value | Count | Frequency (%) |
| 0 | 338 | |
| 1 | 60 | 13.5% |
| 2 | 41 | 9.2% |
| 3 | 4 | 0.9% |
| 5 | 3 | 0.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 338 | |
| 1 | 60 | 13.5% |
| 2 | 41 | 9.2% |
| 3 | 4 | 0.9% |
| 5 | 3 | 0.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 338 | |
| 1 | 60 | 13.5% |
| 2 | 41 | 9.2% |
| 3 | 4 | 0.9% |
| 5 | 3 | 0.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 338 | |
| 1 | 60 | 13.5% |
| 2 | 41 | 9.2% |
| 3 | 4 | 0.9% |
| 5 | 3 | 0.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 338 | |
| 1 | 60 | 13.5% |
| 2 | 41 | 9.2% |
| 3 | 4 | 0.9% |
| 5 | 3 | 0.7% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 378 | 371 |
| Distinct (%) | 84.8% | 83.2% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.6412556 | 6.7533632 |
| Min length | 3 | 3 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 331 | 316 |
| Unique (%) | 74.2% | 70.9% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 347088 | 382652 |
| 2nd row | 239853 | 315037 |
| 3rd row | 347078 | 347085 |
| 4th row | 244270 | PC 17569 |
| 5th row | 219533 | 239853 |
| Value | Count | Frequency (%) |
| pc | 26 | 4.7% |
| c.a | 12 | 2.2% |
| ca | 8 | 1.5% |
| a/5 | 7 | 1.3% |
| w./c | 5 | 0.9% |
| 347082 | 5 | 0.9% |
| sc/paris | 4 | 0.7% |
| soton/o.q | 4 | 0.7% |
| ston/o | 4 | 0.7% |
| 2 | 4 | 0.7% |
| Other values (395) | 472 |
| Value | Count | Frequency (%) |
| pc | 34 | 6.0% |
| ca | 12 | 2.1% |
| c.a | 12 | 2.1% |
| a/5 | 10 | 1.8% |
| 2343 | 7 | 1.2% |
| 2 | 6 | 1.1% |
| w./c | 6 | 1.1% |
| ston/o | 6 | 1.1% |
| sc/paris | 5 | 0.9% |
| soton/oq | 4 | 0.7% |
| Other values (387) | 469 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 385 | |
| 1 | 339 | |
| 2 | 302 | |
| 7 | 246 | |
| 4 | 237 | |
| 6 | 209 | 7.1% |
| 0 | 204 | 6.9% |
| 5 | 193 | 6.5% |
| 9 | 160 | 5.4% |
| 8 | 142 | 4.8% |
| Other values (22) | 545 |
| Value | Count | Frequency (%) |
| 3 | 367 | |
| 1 | 337 | |
| 2 | 300 | |
| 7 | 243 | |
| 4 | 233 | 7.7% |
| 6 | 225 | 7.5% |
| 5 | 207 | 6.9% |
| 0 | 196 | 6.5% |
| 9 | 175 | 5.8% |
| 8 | 128 | 4.2% |
| Other values (21) | 601 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2962 |
| Value | Count | Frequency (%) |
| (unknown) | 3012 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 385 | |
| 1 | 339 | |
| 2 | 302 | |
| 7 | 246 | |
| 4 | 237 | |
| 6 | 209 | 7.1% |
| 0 | 204 | 6.9% |
| 5 | 193 | 6.5% |
| 9 | 160 | 5.4% |
| 8 | 142 | 4.8% |
| Other values (22) | 545 |
| Value | Count | Frequency (%) |
| 3 | 367 | |
| 1 | 337 | |
| 2 | 300 | |
| 7 | 243 | |
| 4 | 233 | 7.7% |
| 6 | 225 | 7.5% |
| 5 | 207 | 6.9% |
| 0 | 196 | 6.5% |
| 9 | 175 | 5.8% |
| 8 | 128 | 4.2% |
| Other values (21) | 601 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2962 |
| Value | Count | Frequency (%) |
| (unknown) | 3012 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 385 | |
| 1 | 339 | |
| 2 | 302 | |
| 7 | 246 | |
| 4 | 237 | |
| 6 | 209 | 7.1% |
| 0 | 204 | 6.9% |
| 5 | 193 | 6.5% |
| 9 | 160 | 5.4% |
| 8 | 142 | 4.8% |
| Other values (22) | 545 |
| Value | Count | Frequency (%) |
| 3 | 367 | |
| 1 | 337 | |
| 2 | 300 | |
| 7 | 243 | |
| 4 | 233 | 7.7% |
| 6 | 225 | 7.5% |
| 5 | 207 | 6.9% |
| 0 | 196 | 6.5% |
| 9 | 175 | 5.8% |
| 8 | 128 | 4.2% |
| Other values (21) | 601 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2962 |
| Value | Count | Frequency (%) |
| (unknown) | 3012 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 385 | |
| 1 | 339 | |
| 2 | 302 | |
| 7 | 246 | |
| 4 | 237 | |
| 6 | 209 | 7.1% |
| 0 | 204 | 6.9% |
| 5 | 193 | 6.5% |
| 9 | 160 | 5.4% |
| 8 | 142 | 4.8% |
| Other values (22) | 545 |
| Value | Count | Frequency (%) |
| 3 | 367 | |
| 1 | 337 | |
| 2 | 300 | |
| 7 | 243 | |
| 4 | 233 | 7.7% |
| 6 | 225 | 7.5% |
| 5 | 207 | 6.9% |
| 0 | 196 | 6.5% |
| 9 | 175 | 5.8% |
| 8 | 128 | 4.2% |
| Other values (21) | 601 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 176 | 172 |
| Distinct (%) | 39.5% | 38.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 30.405914 | 34.22838 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 7 | 5 |
| Zeros (%) | 1.6% | 1.1% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.225 | 7.225 |
| Q1 | 7.925 | 7.8958 |
| median | 14.5 | 14.5 |
| Q3 | 30.5 | 32.596875 |
| 95-th percentile | 92.8948 | 110.8833 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 22.575 | 24.701075 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 43.886341 | 53.895366 |
| Coefficient of variation (CV) | 1.4433489 | 1.5745812 |
| Kurtosis | 37.757534 | 31.178494 |
| Mean | 30.405914 | 34.22838 |
| Median Absolute Deviation (MAD) | 7.25 | 7.25 |
| Skewness | 4.8692666 | 4.7270986 |
| Sum | 13561.038 | 15265.858 |
| Variance | 1926.011 | 2904.7105 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 8.05 | 23 | 5.2% |
| 13 | 21 | 4.7% |
| 26 | 17 | 3.8% |
| 7.8958 | 17 | 3.8% |
| 7.75 | 14 | 3.1% |
| 7.2292 | 12 | 2.7% |
| 7.925 | 11 | 2.5% |
| 7.8542 | 9 | 2.0% |
| 26.55 | 8 | 1.8% |
| 10.5 | 8 | 1.8% |
| Other values (166) | 306 |
| Value | Count | Frequency (%) |
| 8.05 | 22 | 4.9% |
| 13 | 22 | 4.9% |
| 7.8958 | 20 | 4.5% |
| 7.75 | 16 | 3.6% |
| 26 | 16 | 3.6% |
| 10.5 | 13 | 2.9% |
| 7.2292 | 9 | 2.0% |
| 26.55 | 9 | 2.0% |
| 7.25 | 9 | 2.0% |
| 7.775 | 8 | 1.8% |
| Other values (162) | 302 |
| Value | Count | Frequency (%) |
| 0 | 7 | |
| 5 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 2 | 0.4% |
| 6.8583 | 1 | 0.2% |
| 6.975 | 2 | 0.4% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | |
| 7.0542 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 5 | |
| 4.0125 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 2 | 0.4% |
| 6.975 | 2 | 0.4% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 4 |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 91 | 87 |
| Distinct (%) | 83.5% | 84.5% |
| Missing | 337 | 343 |
| Missing (%) | 75.6% | 76.9% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.412844 | 3.6990291 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 74 | 73 |
| Unique (%) | 67.9% | 70.9% |
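For Cabin, the "Distinct (%)" and "Unique (%)" figures are much higher than the raw counts divided by 446 would suggest. They are consistent with percentages taken over the non-missing values only, which is the assumption the sketch below checks (the helper name is illustrative).

```python
N_ROWS = 446


def pct_of_present(count: int, n_missing: int) -> float:
    """Percentage of `count` relative to the non-missing values, one decimal."""
    return round(100 * count / (N_ROWS - n_missing), 1)


# Cabin figures copied from the tables: (distinct, unique, missing).
cabin = {"Dataset A": (91, 74, 337), "Dataset B": (87, 73, 343)}

cabin_pct = {
    name: (pct_of_present(distinct, missing), pct_of_present(unique, missing))
    for name, (distinct, unique, missing) in cabin.items()
}
for name, (distinct_pct, unique_pct) in cabin_pct.items():
    print(f"{name}: distinct {distinct_pct}%, unique {unique_pct}%")
```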
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | A24 | B78 |
| 2nd row | D26 | B39 |
| 3rd row | D49 | C46 |
| 4th row | F E69 | B41 |
| 5th row | E101 | D56 |
| Value | Count | Frequency (%) |
| f | 4 | 3.2% |
| e101 | 3 | 2.4% |
| d36 | 2 | 1.6% |
| g73 | 2 | 1.6% |
| b96 | 2 | 1.6% |
| b98 | 2 | 1.6% |
| c2 | 2 | 1.6% |
| e121 | 2 | 1.6% |
| c92 | 2 | 1.6% |
| f33 | 2 | 1.6% |
| Other values (92) | 101 |
| Value | Count | Frequency (%) |
| f33 | 3 | 2.5% |
| c23 | 3 | 2.5% |
| c25 | 3 | 2.5% |
| c27 | 3 | 2.5% |
| d17 | 2 | 1.6% |
| d20 | 2 | 1.6% |
| b35 | 2 | 1.6% |
| b57 | 2 | 1.6% |
| b59 | 2 | 1.6% |
| b63 | 2 | 1.6% |
| Other values (88) | 98 |
Most occurring characters
| Value | Count | Frequency (%) |
| B | 36 | 9.7% |
| 1 | 34 | 9.1% |
| 2 | 28 | 7.5% |
| 3 | 27 | 7.3% |
| C | 26 | 7.0% |
| 6 | 25 | 6.7% |
| 4 | 22 | 5.9% |
| D | 21 | 5.6% |
| 5 | 21 | 5.6% |
| 9 | 20 | 5.4% |
| Other values (8) | 112 |
| Value | Count | Frequency (%) |
| C | 39 | |
| B | 36 | 9.4% |
| 3 | 35 | 9.2% |
| 1 | 32 | 8.4% |
| 2 | 31 | 8.1% |
| 5 | 28 | 7.3% |
| 6 | 23 | 6.0% |
| 7 | 21 | 5.5% |
| D | 20 | 5.2% |
| 8 | 19 | 5.0% |
| Other values (8) | 97 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 372 |
| Value | Count | Frequency (%) |
| (unknown) | 381 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| B | 36 | 9.7% |
| 1 | 34 | 9.1% |
| 2 | 28 | 7.5% |
| 3 | 27 | 7.3% |
| C | 26 | 7.0% |
| 6 | 25 | 6.7% |
| 4 | 22 | 5.9% |
| D | 21 | 5.6% |
| 5 | 21 | 5.6% |
| 9 | 20 | 5.4% |
| Other values (8) | 112 |
| Value | Count | Frequency (%) |
| C | 39 | |
| B | 36 | 9.4% |
| 3 | 35 | 9.2% |
| 1 | 32 | 8.4% |
| 2 | 31 | 8.1% |
| 5 | 28 | 7.3% |
| 6 | 23 | 6.0% |
| 7 | 21 | 5.5% |
| D | 20 | 5.2% |
| 8 | 19 | 5.0% |
| Other values (8) | 97 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 372 |
| Value | Count | Frequency (%) |
| (unknown) | 381 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| B | 36 | 9.7% |
| 1 | 34 | 9.1% |
| 2 | 28 | 7.5% |
| 3 | 27 | 7.3% |
| C | 26 | 7.0% |
| 6 | 25 | 6.7% |
| 4 | 22 | 5.9% |
| D | 21 | 5.6% |
| 5 | 21 | 5.6% |
| 9 | 20 | 5.4% |
| Other values (8) | 112 |
| Value | Count | Frequency (%) |
| C | 39 | |
| B | 36 | 9.4% |
| 3 | 35 | 9.2% |
| 1 | 32 | 8.4% |
| 2 | 31 | 8.1% |
| 5 | 28 | 7.3% |
| 6 | 23 | 6.0% |
| 7 | 21 | 5.5% |
| D | 20 | 5.2% |
| 8 | 19 | 5.0% |
| Other values (8) | 97 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 372 |
| Value | Count | Frequency (%) |
| (unknown) | 381 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| B | 36 | 9.7% |
| 1 | 34 | 9.1% |
| 2 | 28 | 7.5% |
| 3 | 27 | 7.3% |
| C | 26 | 7.0% |
| 6 | 25 | 6.7% |
| 4 | 22 | 5.9% |
| D | 21 | 5.6% |
| 5 | 21 | 5.6% |
| 9 | 20 | 5.4% |
| Other values (8) | 112 |
| Value | Count | Frequency (%) |
| C | 39 | |
| B | 36 | 9.4% |
| 3 | 35 | 9.2% |
| 1 | 32 | 8.4% |
| 2 | 31 | 8.1% |
| 5 | 28 | 7.3% |
| 6 | 23 | 6.0% |
| 7 | 21 | 5.5% |
| D | 20 | 5.2% |
| 8 | 19 | 5.0% |
| Other values (8) | 97 |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 1 | 0 |
| Missing (%) | 0.2% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | Q |
| 2nd row | S | S |
| 3rd row | S | S |
| 4th row | S | C |
| 5th row | Q | S |
Common Values
| Value | Count | Frequency (%) |
| S | 327 | 73.3% |
| C | 80 | 17.9% |
| Q | 38 | 8.5% |
| (Missing) | 1 | 0.2% |
| Value | Count | Frequency (%) |
| S | 312 | 70.0% |
| C | 95 | 21.3% |
| Q | 39 | 8.7% |
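The Count and Frequency (%) columns can be reproduced by hand: each frequency is the count divided by the number of observations, with missing values kept as their own row. The following is a minimal plain-Python sketch, assuming the column is available as a list with `None` for missing entries. `frequency_table` is a hypothetical helper written for illustration and is not part of ydata-profiling.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent) rows, most frequent first.

    Hypothetical helper for illustration; None is reported as "(Missing)".
    """
    counts = Counter("(Missing)" if v is None else v for v in values)
    total = len(values)
    return [(value, count, round(100 * count / total, 1))
            for value, count in counts.most_common()]

# Counts copied from the first table above (the one with a "(Missing)" row).
embarked_a = ["S"] * 327 + ["C"] * 80 + ["Q"] * 38 + [None]
for row in frequency_table(embarked_a):
    print(row)
```

Dividing by the number of non-missing values instead (445 here) gives slightly different percentages, which is why C appears as 17.9% in this table and as 18.0% in the character tables further down.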
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| s | 327 | 73.5% |
| c | 80 | 18.0% |
| q | 38 | 8.5% |
| Value | Count | Frequency (%) |
| s | 312 | 70.0% |
| c | 95 | 21.3% |
| q | 39 | 8.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 327 | 73.5% |
| C | 80 | 18.0% |
| Q | 38 | 8.5% |
| Value | Count | Frequency (%) |
| S | 312 | 70.0% |
| C | 95 | 21.3% |
| Q | 39 | 8.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| S | 327 | 73.5% |
| C | 80 | 18.0% |
| Q | 38 | 8.5% |
| Value | Count | Frequency (%) |
| S | 312 | 70.0% |
| C | 95 | 21.3% |
| Q | 39 | 8.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| S | 327 | 73.5% |
| C | 80 | 18.0% |
| Q | 38 | 8.5% |
| Value | Count | Frequency (%) |
| S | 312 | 70.0% |
| C | 95 | 21.3% |
| Q | 39 | 8.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| S | 327 | 73.5% |
| C | 80 | 18.0% |
| Q | 38 | 8.5% |
| Value | Count | Frequency (%) |
| S | 312 | 70.0% |
| C | 95 | 21.3% |
| Q | 39 | 8.7% |
Interactions
[Interaction plots: one panel per variable pair for Dataset A and Dataset B. Some panels read "Interaction plot not present for dataset".]
Correlations
[Correlation heatmaps: Dataset A, Dataset B]
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.030 | 0.168 | -0.239 | 0.028 | 0.251 | 0.084 | -0.156 | 0.193 |
| Embarked | 0.030 | 1.000 | 0.143 | 0.000 | 0.000 | 0.226 | 0.048 | 0.000 | 0.123 |
| Fare | 0.168 | 0.143 | 1.000 | 0.395 | 0.008 | 0.456 | 0.203 | 0.419 | 0.285 |
| Parch | -0.239 | 0.000 | 0.395 | 1.000 | -0.012 | 0.000 | 0.270 | 0.440 | 0.131 |
| PassengerId | 0.028 | 0.000 | 0.008 | -0.012 | 1.000 | 0.000 | 0.000 | -0.067 | 0.037 |
| Pclass | 0.251 | 0.226 | 0.456 | 0.000 | 0.000 | 1.000 | 0.077 | 0.122 | 0.362 |
| Sex | 0.084 | 0.048 | 0.203 | 0.270 | 0.000 | 0.077 | 1.000 | 0.193 | 0.480 |
| SibSp | -0.156 | 0.000 | 0.419 | 0.440 | -0.067 | 0.122 | 0.193 | 1.000 | 0.188 |
| Survived | 0.193 | 0.123 | 0.285 | 0.131 | 0.037 | 0.362 | 0.480 | 0.188 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.055 | 0.135 | 0.324 | -0.012 | 0.299 | 0.018 | -0.194 | 0.102 |
| Embarked | 0.055 | 1.000 | 0.210 | 0.012 | 0.050 | 0.297 | 0.087 | 0.114 | 0.192 |
| Fare | 0.135 | 0.210 | 1.000 | 0.162 | -0.019 | 0.480 | 0.199 | 0.462 | 0.274 |
| Parch | 0.324 | 0.012 | 0.162 | 1.000 | 0.056 | 0.000 | 0.223 | 0.341 | 0.140 |
| PassengerId | -0.012 | 0.050 | -0.019 | 0.056 | 1.000 | 0.000 | 0.089 | -0.062 | 0.122 |
| Pclass | 0.299 | 0.297 | 0.480 | 0.000 | 0.000 | 1.000 | 0.184 | 0.163 | 0.364 |
| Sex | 0.018 | 0.087 | 0.199 | 0.223 | 0.089 | 0.184 | 1.000 | 0.157 | 0.582 |
| SibSp | -0.194 | 0.114 | 0.462 | 0.341 | -0.062 | 0.163 | 0.157 | 1.000 | 0.181 |
| Survived | 0.102 | 0.192 | 0.274 | 0.140 | 0.122 | 0.364 | 0.582 | 0.181 | 1.000 |
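The matrices above mix numeric and categorical variables, so a single linear correlation coefficient cannot account for every cell. As an illustration only, the sketch below computes the uncorrected Cramér's V, a standard association measure for two categorical columns. Whether the report's categorical cells use exactly this statistic, or a bias-corrected variant, is an assumption and is not stated in the report. `cramers_v` and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    # With a single category on either side the measure is taken as 0 here.
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Toy data (not from the report): perfectly associated vs. unrelated columns.
sex = ["male", "male", "female", "female"]
print(cramers_v(sex, [0, 0, 1, 1]))  # perfectly associated
print(cramers_v(sex, [0, 1, 0, 1]))  # no association
```

The statistic ranges from 0 (no association) to 1 (one column determines the other) and carries no sign, which is consistent with the categorical cells in both matrices being non-negative.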
Missing values
A simple visualization of nullity by column (one plot each for Dataset A and Dataset B).
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion (one plot each for Dataset A and Dataset B).
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another (one plot each for Dataset A and Dataset B).
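The quantities behind these plots are simple to compute. The sketch below counts missing values per column and takes the nullity correlation to be the Pearson correlation between the "is present" indicators of two columns. It is a plain-Python illustration under that assumption, not the code the report uses; `nullity_counts`, `nullity_correlation`, and the toy rows are hypothetical.

```python
from math import sqrt

def nullity_counts(rows, columns):
    """Number of missing (None) entries per column."""
    return {c: sum(row[c] is None for row in rows) for c in columns}

def nullity_correlation(rows, a, b):
    """Pearson correlation between the 'is present' indicators of a and b."""
    xs = [0 if row[a] is None else 1 for row in rows]
    ys = [0 if row[b] is None else 1 for row in rows]
    n = len(rows)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # Undefined when a column is always present or always missing.
    return cov / (sx * sy) if sx and sy else float("nan")

# Toy rows (not from the report); None marks a missing value.
rows = [
    {"Age": 40.0, "Cabin": None},
    {"Age": None, "Cabin": None},
    {"Age": 31.0, "Cabin": "A24"},
    {"Age": 54.0, "Cabin": "D26"},
]
print(nullity_counts(rows, ["Age", "Cabin"]))
print(nullity_correlation(rows, "Age", "Cabin"))
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and columns with no missing values have no defined nullity correlation.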
Sample
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 360 | 361 | 0 | 3 | Skoog, Mr. Wilhelm | male | 40.0 | 1 | 4 | 347088 | 27.9000 | NaN | S |
| 277 | 278 | 0 | 2 | Parkes, Mr. Francis "Frank" | male | NaN | 0 | 0 | 239853 | 0.0000 | NaN | S |
| 775 | 776 | 0 | 3 | Myhrman, Mr. Pehr Fabian Oliver Malkolm | male | 18.0 | 0 | 0 | 347078 | 7.7500 | NaN | S |
| 673 | 674 | 1 | 2 | Wilhelms, Mr. Charles | male | 31.0 | 0 | 0 | 244270 | 13.0000 | NaN | S |
| 626 | 627 | 0 | 2 | Kirkland, Rev. Charles Leonard | male | 57.0 | 0 | 0 | 219533 | 12.3500 | NaN | Q |
| 867 | 868 | 0 | 1 | Roebling, Mr. Washington Augustus II | male | 31.0 | 0 | 0 | PC 17590 | 50.4958 | A24 | S |
| 124 | 125 | 0 | 1 | White, Mr. Percival Wayland | male | 54.0 | 0 | 1 | 35281 | 77.2875 | D26 | S |
| 681 | 682 | 1 | 1 | Hassab, Mr. Hammad | male | 27.0 | 0 | 0 | PC 17572 | 76.7292 | D49 | C |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 854 | 855 | 0 | 2 | Carter, Mrs. Ernest Courtenay (Lilian Hughes) | female | 44.0 | 1 | 0 | 244252 | 26.0000 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 278 | 279 | 0 | 3 | Rice, Master. Eric | male | 7.0 | 4 | 1 | 382652 | 29.1250 | NaN | Q |
| 158 | 159 | 0 | 3 | Smiljanic, Mr. Mile | male | NaN | 0 | 0 | 315037 | 8.6625 | NaN | S |
| 554 | 555 | 1 | 3 | Ohman, Miss. Velin | female | 22.0 | 0 | 0 | 347085 | 7.7750 | NaN | S |
| 31 | 32 | 1 | 1 | Spencer, Mrs. William Augustus (Marie Eugenie) | female | NaN | 1 | 0 | PC 17569 | 146.5208 | B78 | C |
| 413 | 414 | 0 | 2 | Cunningham, Mr. Alfred Fleming | male | NaN | 0 | 0 | 239853 | 0.0000 | NaN | S |
| 490 | 491 | 0 | 3 | Hagland, Mr. Konrad Mathias Reiersen | male | NaN | 1 | 0 | 65304 | 19.9667 | NaN | S |
| 588 | 589 | 0 | 3 | Gilinski, Mr. Eliezer | male | 22.0 | 0 | 0 | 14973 | 8.0500 | NaN | S |
| 539 | 540 | 1 | 1 | Frolicher, Miss. Hedwig Margaritha | female | 22.0 | 0 | 2 | 13568 | 49.5000 | B39 | C |
| 541 | 542 | 0 | 3 | Andersson, Miss. Ingeborg Constanzia | female | 9.0 | 4 | 2 | 347082 | 31.2750 | NaN | S |
| 741 | 742 | 0 | 1 | Cavendish, Mr. Tyrell William | male | 36.0 | 1 | 0 | 19877 | 78.8500 | C46 | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 729 | 730 | 0 | 3 | Ilmakangas, Miss. Pieta Sofia | female | 25.00 | 1 | 0 | STON/O2. 3101271 | 7.9250 | NaN | S |
| 206 | 207 | 0 | 3 | Backstrom, Mr. Karl Alfred | male | 32.00 | 1 | 0 | 3101278 | 15.8500 | NaN | S |
| 318 | 319 | 1 | 1 | Wick, Miss. Mary Natalie | female | 31.00 | 0 | 2 | 36928 | 164.8667 | C7 | S |
| 440 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) | female | 45.00 | 1 | 1 | F.C.C. 13529 | 26.2500 | NaN | S |
| 871 | 872 | 1 | 1 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | female | 47.00 | 1 | 1 | 11751 | 52.5542 | D35 | S |
| 174 | 175 | 0 | 1 | Smith, Mr. James Clinch | male | 56.00 | 0 | 0 | 17764 | 30.6958 | A7 | C |
| 755 | 756 | 1 | 2 | Hamalainen, Master. Viljo | male | 0.67 | 1 | 1 | 250649 | 14.5000 | NaN | S |
| 640 | 641 | 0 | 3 | Jensen, Mr. Hans Peder | male | 20.00 | 0 | 0 | 350050 | 7.8542 | NaN | S |
| 603 | 604 | 0 | 3 | Torber, Mr. Ernst William | male | 44.00 | 0 | 0 | 364511 | 8.0500 | NaN | S |
| 292 | 293 | 0 | 2 | Levy, Mr. Rene Jacques | male | 36.00 | 0 | 0 | SC/Paris 2163 | 12.8750 | D | C |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 714 | 715 | 0 | 2 | Greenberg, Mr. Samuel | male | 52.0 | 0 | 0 | 250647 | 13.0000 | NaN | S |
| 749 | 750 | 0 | 3 | Connaghton, Mr. Michael | male | 31.0 | 0 | 0 | 335097 | 7.7500 | NaN | Q |
| 839 | 840 | 1 | 1 | Marechal, Mr. Pierre | male | NaN | 0 | 0 | 11774 | 29.7000 | C47 | C |
| 503 | 504 | 0 | 3 | Laitinen, Miss. Kristina Sofia | female | 37.0 | 0 | 0 | 4135 | 9.5875 | NaN | S |
| 584 | 585 | 0 | 3 | Paulner, Mr. Uscher | male | NaN | 0 | 0 | 3411 | 8.7125 | NaN | C |
| 778 | 779 | 0 | 3 | Kilgannon, Mr. Thomas J | male | NaN | 0 | 0 | 36865 | 7.7375 | NaN | Q |
| 693 | 694 | 0 | 3 | Saad, Mr. Khalil | male | 25.0 | 0 | 0 | 2672 | 7.2250 | NaN | C |
| 672 | 673 | 0 | 2 | Mitchell, Mr. Henry Michael | male | 70.0 | 0 | 0 | C.A. 24580 | 10.5000 | NaN | S |
| 454 | 455 | 0 | 3 | Peduzzi, Mr. Joseph | male | NaN | 0 | 0 | A/5 2817 | 8.0500 | NaN | S |
| 396 | 397 | 0 | 3 | Olsson, Miss. Elina | female | 31.0 | 0 | 0 | 350407 | 7.8542 | NaN | S |
Duplicate rows
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.
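This result is expected: the alerts report that PassengerId is unique in both datasets, so no two complete rows can be identical. For reference, a duplicate-row check amounts to counting how often each full row occurs. The following is a minimal plain-Python sketch with a hypothetical helper, `duplicate_rows`, and a reduced set of columns for brevity.

```python
from collections import Counter

def duplicate_rows(rows):
    """Return {row: occurrences} for rows that appear more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Two rows abbreviated to (PassengerId, Name, Embarked) for illustration.
rows = [
    (361, "Skoog, Mr. Wilhelm", "S"),
    (278, 'Parkes, Mr. Francis "Frank"', "S"),
]
print(duplicate_rows(rows))              # no duplicates
print(duplicate_rows(rows + rows[:1]))   # first row now appears twice
```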